在现实世界中,尽管对该领域的兴趣激增,但在稀疏回报协同环境下进行的加强学习仍然具有挑战性。先前的尝试表明,内在的奖励可以减轻稀疏引起的问题。在本文中,我们提出了一种新颖的固有奖励,该奖励受人类学习的启发,因为人类通过将当前的观察结果与历史知识进行比较来评估好奇心。具体而言,我们训练一个自我监督的预测模型,并保存一组模型参数的快照,而不会产生加法培训成本。然后,我们采用核规范来评估不同快照的预测之间的时间不一致,这可以进一步部署为内在的奖励。此外,提出了一种变异的加权机制,以自适应方式将权重分配给不同的快照。我们证明了所提出的方法在各种基准环境中的功效。结果表明,与其他基于奖励的方法相比,我们的方法可以提供压倒性的最先进性能,而不会产生额外的培训成本并保持更高的噪声耐受性。我们的代码将公开发布以提高可重复性。
translated by 谷歌翻译
外部奖励的稀疏性对加强学习(RL)构成了严重的挑战。当前,对好奇心已经做出了许多努力,这些努力可以为有效探索提供代表性的内在奖励。但是,挑战尚未得到解决。在本文中,我们提出了一种名为Dymecu的RL的好奇心,它代表了基于动态记忆的好奇心。受到人类好奇心和信息理论的启发,Dymecu由动态记忆和双重在线学习者组成。好奇心引起的话,如果记忆的信息无法处理当前状态,并且双重学习者之间的信息差距可以作为对代理的内在奖励进行表述,然后可以将这些状态信息巩固到动态内存中。与以前的好奇方法相比,dymecu可以更好地模仿人类的好奇心与动态记忆,并且可以根据双重学习者的引导范式动态地生长内存模块。在包括DeepMind Control Suite和Atari Suite在内的多个基准测试中,进行了大规模的经验实验,结果表明,Dymecu在有或没有外部奖励的情况下优于基于好奇心的方法。我们将发布代码以增强可重复性。
translated by 谷歌翻译
Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.
translated by 谷歌翻译
In recent years, large amounts of effort have been put into pushing forward the real-world application of dynamic digital human (DDH). However, most current quality assessment research focuses on evaluating static 3D models and usually ignores motion distortions. Therefore, in this paper, we construct a large-scale dynamic digital human quality assessment (DDH-QA) database with diverse motion content as well as multiple distortions to comprehensively study the perceptual quality of DDHs. Both model-based distortion (noise, compression) and motion-based distortion (binding error, motion unnaturalness) are taken into consideration. Ten types of common motion are employed to drive the DDHs and a total of 800 DDHs are generated in the end. Afterward, we render the video sequences of the distorted DDHs as the evaluation media and carry out a well-controlled subjective experiment. Then a benchmark experiment is conducted with the state-of-the-art video quality assessment (VQA) methods and the experimental results show that existing VQA methods are limited in assessing the perceptual loss of DDHs. The database will be made publicly available to facilitate future research.
translated by 谷歌翻译
Data-Free Class Incremental Learning (DFCIL) aims to sequentially learn tasks with access only to data from the current one. DFCIL is of interest because it mitigates concerns about privacy and long-term storage of data, while at the same time alleviating the problem of catastrophic forgetting in incremental learning. In this work, we introduce robust saliency guidance for DFCIL and propose a new framework, which we call RObust Saliency Supervision (ROSS), for mitigating the negative effect of saliency drift. Firstly, we use a teacher-student architecture leveraging low-level tasks to supervise the model with global saliency. We also apply boundary-guided saliency to protect it from drifting across object boundaries at intermediate layers. Finally, we introduce a module for injecting and recovering saliency noise to increase robustness of saliency preservation. Our experiments demonstrate that our method can retain better saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet and ImageNet-Subset DFCIL benchmarks. Code will be made publicly available.
translated by 谷歌翻译
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
translated by 谷歌翻译
Semantic segmentation usually benefits from global contexts, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters with these aspects, we present a simple yet powerful semantic segmentation architecture, termed as IncepFormer. IncepFormer has two critical contributions as following. First, it introduces a novel pyramid structured Transformer encoder which harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions, and a light-weight feed-forward module in each self-attention layer, efficiently obtaining rich local multi-scale object features. Extensive experiments on five benchmarks show that our IncepFormer is superior to state-of-the-art methods in both accuracy and speed, e.g., 1) our IncepFormer-S achieves 47.7% mIoU on ADE20K which outperforms the existing best method by 1% while only costs half parameters and fewer FLOPs. 2) Our IncepFormer-B finally achieves 82.0% mIoU on Cityscapes dataset with 39.6M parameters. Code is available:github.com/shendu0321/IncepFormer.
translated by 谷歌翻译
A common scenario of Multilingual Neural Machine Translation (MNMT) is that each translation task arrives in a sequential manner, and the training data of previous tasks is unavailable. In this scenario, the current methods suffer heavily from catastrophic forgetting (CF). To alleviate the CF, we investigate knowledge distillation based life-long learning methods. Specifically, in one-tomany scenario, we propose a multilingual distillation method to make the new model (student) jointly learn multilingual output from old model (teacher) and new task. In many-to one scenario, we find that direct distillation faces the extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. The experimental results on twelve translation tasks show that the proposed methods can better consolidate the previous knowledge and sharply alleviate the CF.
translated by 谷歌翻译
Efficient ObjectGoal navigation (ObjectNav) in novel environments requires an understanding of the spatial and semantic regularities in environment layouts. In this work, we present a straightforward method for learning these regularities by predicting the locations of unobserved objects from incomplete semantic maps. Our method differs from previous prediction-based navigation methods, such as frontier potential prediction or egocentric map completion, by directly predicting unseen targets while leveraging the global context from all previously explored areas. Our prediction model is lightweight and can be trained in a supervised manner using a relatively small amount of passively collected data. Once trained, the model can be incorporated into a modular pipeline for ObjectNav without the need for any reinforcement learning. We validate the effectiveness of our method on the HM3D and MP3D ObjectNav datasets. We find that it achieves the state-of-the-art on both datasets, despite not using any additional data for training.
translated by 谷歌翻译
Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds. Entity set expansion in software-related domains such as StackOverflow can benefit many downstream tasks (e.g., software knowledge graph construction) and facilitate better IT operations and service management. Meanwhile, existing approaches are less concerned with two problems: (1) How to deal with multiple types of seed entities simultaneously? (2) How to leverage the power of pre-trained language models (PLMs)? Being aware of these two problems, in this paper, we study the entity set co-expansion task in StackOverflow, which extracts Library, OS, Application, and Language entities from StackOverflow question-answer threads. During the co-expansion process, we use PLMs to derive embeddings of candidate entities for calculating similarities between entities. Experimental results show that our proposed SECoExpan framework outperforms previous approaches significantly.
translated by 谷歌翻译